This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find that the posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. We also prove that, starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). Finally, our results show that with data-agnostic priors a novel notion of effective depth given by \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.
translated by 谷歌翻译
For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime. However, after a critical sample size $P^*$, we empirically find the finite-width network generalization becomes worse than that of the infinite width network. In this work, we empirically study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$. We find that finite-size effects can become relevant for very small dataset sizes on the order of $P^* \sim \sqrt{N}$ for polynomial regression with ReLU networks. We discuss the source of these effects using an argument based on the variance of the NN's final neural tangent kernel (NTK). This transition can be pushed to larger $P$ by enhancing feature learning or by ensemble averaging the networks. We find that the learning curve for regression with the final NTK is an accurate approximation of the NN learning curve. Using this, we provide a toy model which also exhibits $P^* \sim \sqrt{N}$ scaling and has $P$-dependent benefits from feature learning.
translated by 谷歌翻译
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.
translated by 谷歌翻译
Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or gradients thereof cannot be explicitly or efficiently written down. Score matching is a training method, whereby instead of fitting the likelihood $\log p(x)$ for the training data, we instead fit the score function $\nabla_x \log p(x)$ -- obviating the need to evaluate the partition function. Though this estimator is known to be consistent, its unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood -- which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated -- i.e. the Poincar\'e, log-Sobolev and isoperimetric constant -- quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to the maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant -- even for simple families of distributions like exponential families with rich enough sufficient statistics -- score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite sample regime, and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
translated by 谷歌翻译
现在,可以使用最先进的神经语言模型通过零射门提示来解决临时语言任务,而无需进行监督培训。近年来,这种方法已广受欢迎,研究人员证明了提示在特定的NLP任务上实现强烈准确的提示。但是,找到新任务的提示需要实验。具有不同措辞选择的不同提示模板会导致明显的准确性差异。提示允许用户尝试及时变化,可视化及时性能,并迭代优化提示。我们开发了一个工作流程,该工作流程允许用户首先使用少量数据专注于模型反馈,然后再进入大型数据制度,该数据制度允许使用任务的定量度量来实现有希望的提示的经验基础。然后,该工具可以轻松部署新创建的临时模型。我们使用多种现实世界用例演示了Fackide(http://prompt.vizhub.ai)和我们的工作流程的实用性。
translated by 谷歌翻译
近年来,人类面孔的影子化化身已经走了很长一段路,但是该地区的研究受到缺乏公开可用的高质量数据集的限制。在这项工作中,我们介绍了Multiface,这是一种新的多视图,高分辨率的人脸数据集,该数据集是从13个身份的神经面部渲染研究中收集的13个身份。我们介绍了Mugsy,这是一种大型多摄像机设备,可捕获面部表现的高分辨率同步视频。 Multiface的目的是缩小学术界高质量数据的可访问性的差距,并使VR触觉研究能够进行研究。随着数据集的释放,我们对不同模型体系结构对模型的新观点和表达式的插值能力进行消融研究。通过有条件的VAE模型作为我们的基线,我们发现添加空间偏见,纹理翘曲场和残差连接可改善新型视图合成的性能。我们的代码和数据可在以下网址获得:https://github.com/facebookresearch/multiface
translated by 谷歌翻译
COVID-19的大流行提出了对多个领域决策者的流行预测的重要性,从公共卫生到整个经济。虽然预测流行进展经常被概念化为类似于天气预测,但是它具有一些关键的差异,并且仍然是一项非平凡的任务。疾病的传播受到人类行为,病原体动态,天气和环境条件的多种混杂因素的影响。由于政府公共卫生和资助机构的倡议,捕获以前无法观察到的方面的丰富数据来源的可用性增加了研究的兴趣。这尤其是在“以数据为中心”的解决方案上进行的一系列工作,这些解决方案通过利用非传统数据源以及AI和机器学习的最新创新来增强我们的预测能力的潜力。这项调查研究了各种数据驱动的方法论和实践进步,并介绍了一个概念框架来导航它们。首先,我们列举了与流行病预测相关的大量流行病学数据集和新的数据流,捕获了各种因素,例如有症状的在线调查,零售和商业,流动性,基因组学数据等。接下来,我们将讨论关注最近基于数据驱动的统计和深度学习方法的方法和建模范式,以及将机械模型知识域知识与统计方法的有效性和灵活性相结合的新型混合模型类别。我们还讨论了这些预测系统的现实部署中出现的经验和挑战,包括预测信息。最后,我们重点介绍了整个预测管道中发现的一些挑战和开放问题。
translated by 谷歌翻译
COVID-19大流行的快速扩散导致SARS-COV-2基因组的序列数据量很大,数百万序列和计数。尽管超出传统方法的能力来理解病毒的多样性,动态和演变的能力,但这一数量的数量幅度仍然是机器学习(ML)方法的丰富资源(ML)作为从这些数据中提取此类重要信息的替代方法。因此,设计一个用于测试和基准测试这些ML模型的鲁棒性的框架至关重要。本文(据我们所知)首次努力通过使用错误模拟生物学序列来基准ML模型的鲁棒性。在本文中,我们介绍了几种方法来扰动SARS-COV-2基因组序列,以模仿普通测序平台(例如Illumina和pacbio)的误差曲线。我们从在各种ML模型上的实验中证明,对于某些特定的嵌入方法,某些基于仿真的方法比其他针对输入序列的对抗性攻击更健壮(和准确)。我们的基准测试框架可以帮助研究人员正确评估不同的ML模型,并帮助他们了解SARS-COV-2病毒的行为或避免未来可能的大流行。
translated by 谷歌翻译
在潜在的强盗问题中,学习者可以访问奖励分布,并且 - 对于非平稳的变体 - 环境的过渡模型。奖励分布在手臂和未知的潜在状态下进行条件。目的是利用奖励历史来识别潜在状态,从而使未来的武器选择最佳。潜在的匪徒设置将自己适用于许多实际应用,例如推荐人和决策支持系统,其中丰富的数据允许在线学习的环境模型的离线估算仍然是关键组成部分。在这种情况下,以前的解决方案始终根据代理商对国家的信念选择最高的奖励组,而不是明确考虑信息收集臂的价值。这种信息收集的武器不一定会提供最高的奖励,因此永远不会选择始终选择最高奖励武器的代理商选择。在本文中,我们提出了一种潜在土匪信息收集的方法。鉴于特殊的奖励结构和过渡矩阵,我们表明,鉴于代理商对国家的信念,选择最好的手臂会产生更高的遗憾。此外,我们表明,通过仔细选择武器,我们可以改善对国家分布的估计,从而通过将来通过更好的手臂选择来降低累积后悔。我们在合成和现实世界数据集上评估了我们的方法,显示出对最新方法的遗憾显着改善。
translated by 谷歌翻译
Adaptive partial linear beamforming meets the need of 5G and future 6G applications for high flexibility and adaptability. Choosing an appropriate tradeoff between conflicting goals opens the recently proposed multiuser (MU) detection method. Due to their high spatial resolution, nonlinear beamforming filters can significantly outperform linear approaches in stationary scenarios with massive connectivity. However, a dramatic decrease in performance can be expected in high mobility scenarios because they are very susceptible to changes in the wireless channel. The robustness of linear filters is required, considering these changes. One way to respond appropriately is to use online machine learning algorithms. The theory of algorithms based on the adaptive projected subgradient method (APSM) is rich, and they promise accurate tracking capabilities in dynamic wireless environments. However, one of the main challenges comes from the real-time implementation of these algorithms, which involve projections on time-varying closed convex sets. While the projection operations are relatively simple, their vast number poses a challenge in ultralow latency (ULL) applications where latency constraints must be satisfied in every radio frame. Taking non-orthogonal multiple access (NOMA) systems as an example, this paper explores the acceleration of APSM-based algorithms through massive parallelization. The result is a GPUaccelerated real-time implementation of an orthogonal frequency-division multiplexing (OFDM)based transceiver that enables detection latency of less than one millisecond and therefore complies with the requirements of 5G and beyond. To meet the stringent physical layer latency requirements, careful co-design of hardware and software is essential, especially in virtualized wireless systems with hardware accelerators.
translated by 谷歌翻译